Foreword



This Technical Specification has been produced by the 3rd Generation Partnership Project (3GPP).

The contents of the present document are subject to continuing work within the TSG and may change following formal 
TSG approval. Should the TSG modify the contents of the present document, it will be re-released by the TSG with an 
identifying change of release date and an increase in version number as follows: 

Version x.y.z 

where: 

x the first digit:

1 presented to TSG for information; 

2 presented to TSG for approval; 

3 or greater indicates TSG approved document under change control. 

y the second digit is incremented for all changes of substance, i.e. technical enhancements, corrections, 
updates, etc. 

z the third digit is incremented when editorial only changes have been incorporated in the specification.

The 3GPP transparent end-to-end packet-switched streaming service (PSS) specification consists of two 3G TSs: 
3GPP TS 26.233 [2] and the present document. The first TS provides an overview of the 3GPP PSS; the present 
document specifies the protocols and codecs used by the service. 



Introduction 



Streaming refers to the ability of an application to play synchronised media streams like audio and video streams in a 
continuous way while those streams are being transmitted to the client over a data network. 

Applications that can be built on top of streaming services can be classified into on-demand and live information 
delivery applications. Examples of the first category are music-on-demand and news-on-demand applications. Live 
delivery of radio and television programs is an example of the second category. 

The 3GPP PSS provides a framework for Internet Protocol (IP) based streaming applications in 3G networks. 
1 Scope 



The present document specifies the protocols and codecs for the PSS within the 3GPP system. Protocols for control 
signalling, scene description, media transport and media encapsulations are specified. Codecs for speech, audio, video, 
still images, bitmap graphics, and text are specified. 

The present document is applicable to IP based packet switched networks. 



2 References 

The following documents contain provisions which, through reference in this text, constitute provisions of the present 
document. 

• References are either specific (identified by date of publication, edition number, version number, etc.) or 
non-specific. 

• For a specific reference, subsequent revisions do not apply. 

• For a non-specific reference, the latest version applies. In the case of a reference to a 3GPP document (including 
a GSM document), a non-specific reference implicitly refers to the latest version of that document in the same 
Release as the present document. 
Annex B: AMR-WB RTP payload and MIME type registration". 

[13] IETF RFC 3016: "RTP Payload Format for MPEG-4 Audio/Visual Streams", Kikuchi Y. et al., November 2000. 
[17] IETF RFC 2616: "Hypertext Transfer Protocol - HTTP/1.1", Fielding R. et al., June 1999. 

[18] 3GPP TS 26.071: "Mandatory Speech Codec speech processing functions; AMR Speech Codec; General description". 

[19] 3GPP TS 26.101: "Mandatory Speech Codec speech processing functions; AMR Speech Codec; Frame Structure". 
3 Definitions and abbreviations 

3.1 Definitions 

For the purposes of the present document, the following terms and definitions apply: 

continuous media: media with an inherent notion of time; in the present document, speech, audio and video 

discrete media: media that itself does not contain an element of time; in the present document, all media not defined 
as continuous media 
presentation description: contains information about one or more media streams within a presentation, such as the set 
of encodings, network addresses and information about the content 

PSS client: client for the 3GPP packet based streaming service based on the IETF RTSP/SDP and/or HTTP standards, 
with possible additional 3GPP requirements according to the present document 

PSS server: server for the 3GPP packet based streaming service based on the IETF RTSP/SDP and/or HTTP standards, 
with possible additional 3GPP requirements according to the present document 

scene description: description of the spatial layout and temporal behaviour of a presentation; it can also contain 
hyperlinks 

3.2 Abbreviations 

For the purposes of the present document, the abbreviations given in 3GPP TR 21.905 [3] and the following apply. 

AAC Advanced Audio Coding 

BIFS Binary Format for Scene description 

DCT Discrete Cosine Transform 

GIF Graphics Interchange Format 

HTML Hyper Text Markup Language 

ITU-T International Telecommunication Union - Telecommunication Standardization Sector 

JFIF JPEG File Interchange Format 

MIME Multipurpose Internet Mail Extensions 

MMS Multimedia Messaging Service 

MP4 MPEG-4 file format 

PSS Packet-switched Streaming Service 

QCIF Quarter Common Intermediate Format 

RTCP RTP Control Protocol 

RTP Real-time Transport Protocol 

RTSP Real-Time Streaming Protocol 

SDP Session Description Protocol 

SMIL Synchronised Multimedia Integration Language 

UCS-2 Universal Character Set (the two octet form) 

UTF-8 Unicode Transformation Format (the 8-bit form) 

W3C WWW Consortium 

WML Wireless Markup Language 

XHTML extensible Hyper Text Markup Language 

XML extensible Markup Language 
Scope of PSS 

NOTE: Dashed components are not specified for the simple PSS. 

Figure 1 : Functional components of a PSS client 

Figure 1 shows the functional components of a PSS client. Figure 2 gives an overview of the protocol stack used in a 
PSS client and also shows a more detailed view of the packet based network interface. The functional components can 
be divided into control, scene description, media codecs and the transport of media and control data. TS 26.233 [2] 
defines the simple and extended PSS. Dashed functional components in figure 1 are not specified for the simple PSS. 
The control related elements are session establishment, capability exchange and session control (see clause 5). 

Session establishment refers to methods to invoke a PSS session from a browser or directly by entering a URL 
in the terminal's user interface. 

Capability exchange enables choice or adaptation of media streams depending on different terminal capabilities. 

Session control deals with the set-up of the individual media streams between a PSS client and one or several 
PSS servers. It also enables control of the individual media streams by the user. It may involve VCR-like 
presentation control functions like start, pause, fast forward and stop of a media presentation. 

The scene description consists of the spatial layout and a description of the temporal relation between the different 
media included in the media presentation. The former gives the layout of the media components on the screen, and the 
latter controls the synchronisation of the different media (see clause 8). 

The PSS includes media codecs for video, still images, bitmap graphics, text, audio, and speech (see clause 7). 

Transport of media and control data consists of the encapsulation of the coded media and control data in a transport 
protocol (see clause 6). This is shown in figure 1 as the "packet based network interface" and displayed in more detail in 
the protocol stack of figure 2. 
Figure 2: Overview of the protocol stack 



5 Protocols 

5.1 Session establishment 



Session establishment refers to the method by which a PSS client obtains the initial session description. The initial 
session description can be, for example, a presentation description, a scene description or just a URL to the content. 

A PSS client shall support initial session descriptions specified in one of the following formats: SMIL, SDP, or plain 
RTSP URL. 

In addition to rtsp://, the PSS client shall support URLs [4] to valid initial session descriptions starting with file:// (for 
locally stored files) and http:// (for presentation descriptions or scene descriptions delivered via HTTP). 

Examples of valid inputs to a PSS client are: file://temp/morning_news.smil, http://mediaportal/morning_news.sdp 
and rtsp://mediaportal/morning_news. 
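The three accepted URL forms can be illustrated with a short client-side sketch. It is a sketch only, under stated assumptions. The function name `classify_initial_description` is hypothetical. Inferring SMIL or SDP from the file extension is an assumption made for illustration; a real client would more likely rely on the HTTP Content-Type or on the content itself.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# URL schemes a PSS client is required to accept for initial session descriptions.
SUPPORTED_SCHEMES = {"rtsp", "http", "file"}

# Assumption: description format guessed from the file extension (illustrative only).
EXTENSION_FORMATS = {".smil": "SMIL", ".sdp": "SDP"}


def classify_initial_description(url):
    """Return (scheme, format) for an initial session description URL (hypothetical helper)."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported URL scheme: {scheme!r}")
    if scheme == "rtsp":
        # A plain RTSP URL: the client obtains the SDP itself, e.g. with DESCRIBE.
        return scheme, "RTSP URL"
    suffix = PurePosixPath(parts.path).suffix.lower()
    fmt = EXTENSION_FORMATS.get(suffix)
    if fmt is None:
        raise ValueError(f"cannot infer description format from {url!r}")
    return scheme, fmt
```

Applied to the three example inputs above, the function returns ("file", "SMIL"), ("http", "SDP") and ("rtsp", "RTSP URL") respectively.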
URLs can be made available to a PSS client in many different ways. It is out of the scope of the present document to 
mandate any specific mechanism. However, an application using the 3GPP PSS shall at least support URLs of the 
above types, specified or selected by the user. 

The preferred way would be to embed URLs to initial session descriptions within HTML or WML pages. Browser 
applications that support the HTTP protocol could then download the initial session description and pass the content to 
the PSS client for further processing. How exactly this is done is implementation specific and out of the scope of the 
present document. 



5.2 Capability exchange 



No explicit capability exchange protocol is specified for the simple PSS. Instead, it is assumed that the user is aware 
that the content he/she is about to stream fits the capabilities (e.g. screen size) of the particular device used. Protocols 
for capability exchange can be specified for the extended PSS. 

5.3 Session set-up and control 

5.3.1 General 

Continuous media are media that have an intrinsic time line. Discrete media, on the other hand, do not themselves contain 
an element of time. In this specification, speech, audio and video belong to the first category, and still images and text 
to the latter. Bitmap graphics can fall into both groups, but are defined in this specification to be discrete media. 

Streaming of continuous media using RTP/UDP/IP (see clause 6.2) requires a session control protocol to set up and 
control the individual media streams. For the transport of discrete media, this specification adopts HTTP/TCP/IP (see 
clause 6.3). In this case there is no need for a separate session set-up and control protocol, since this functionality is 
built into HTTP. This clause describes session set-up and control of continuous media. 

5.3.2 RTSP 

RTSP [5] shall be used for session set-up and session control. PSS clients and servers shall follow the rules for minimal 
on-demand playback RTSP implementations in appendix D of [5]. In addition to this: 

PSS servers and clients shall implement the DESCRIBE method (see clause 10.2 in [5]); 

PSS servers and clients shall implement the Range header field (see clause 12.29 in [5]). 
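The sketch below serialises an RTSP/1.0 request and formats a Range header value in normal play time (npt). This makes the two additional requirements concrete. It is illustrative only. The helper names `rtsp_request` and `npt_range` are hypothetical, and request bodies, authentication and response parsing are deliberately left out.

```python
from __future__ import annotations

CRLF = "\r\n"


def rtsp_request(method: str, url: str, cseq: int,
                 headers: dict[str, str] | None = None) -> bytes:
    """Serialise a body-less RTSP/1.0 request (hypothetical helper)."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    lines += [f"{name}: {value}" for name, value in (headers or {}).items()]
    # The header section is terminated by an empty line.
    return (CRLF.join(lines) + CRLF + CRLF).encode("ascii")


def npt_range(start: float, end: float | None = None) -> str:
    """Range value in normal play time seconds, e.g. 'npt=0-45.678' or open-ended 'npt=10-'."""
    return f"npt={start}-" + ("" if end is None else f"{end}")
```

For example, `rtsp_request("DESCRIBE", "rtsp://mediaserver.com/movie.test", 1, {"Accept": "application/sdp"})` fetches the presentation description. A later PLAY request could carry `{"Session": "12345678", "Range": npt_range(10)}`; the session identifier there is a made-up placeholder.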

5.3.3 SDP 

RTSP requires a presentation description. SDP shall be used as the format of the presentation description for both PSS 
clients and servers. PSS servers shall provide, and PSS clients shall interpret, the SDP syntax according to the SDP 
specification [6] and appendix C of [5]. The SDP delivered to the PSS client shall declare the media types to be used in 
the session using a codec-specific MIME media type for each media. The MIME media types to be used in the SDP file 
are described in clause 5.4 of the present document. 

The SDP [6] specification requires certain fields to always be included in an SDP file. Apart from this a PSS server 
shall always include the following fields in the SDP: 

"a=control:" according to clauses C.1.1, C.2 and C.3 in [5]; 

"a=range:" according to clause C.1.5 in [5]; 

"a=rtpmap:" according to clause 6 in [6]; 

"a=fmtp:" according to clause 6 in [6]. 
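A server-side sanity check for the mandatory attributes could look like the following sketch. It rests on two interpretations of the requirement above. First, "a=range" is accepted either once at session level or in every media description. Second, "a=control", "a=rtpmap" and "a=fmtp" are checked per media description. The function names are hypothetical.

```python
def split_sdp(sdp):
    """Split SDP text into session-level lines and a list of per-media line lists."""
    session, media = [], []
    for raw in sdp.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("m="):
            media.append([line])       # a new media description starts here
        elif media:
            media[-1].append(line)     # belongs to the current media description
        else:
            session.append(line)       # still in the session-level part
    return session, media


def has_attribute(lines, name):
    return any(line.startswith(f"a={name}:") for line in lines)


def missing_pss_attributes(sdp):
    """List required attributes the SDP lacks (an empty list means all are present)."""
    session, media = split_sdp(sdp)
    problems = []
    # Assumption: a=range once at session level, or in every media description.
    range_ok = has_attribute(session, "range") or (
        bool(media) and all(has_attribute(section, "range") for section in media))
    if not range_ok:
        problems.append("a=range")
    # Assumption: control, rtpmap and fmtp are required in each media description.
    for index, section in enumerate(media):
        for name in ("control", "rtpmap", "fmtp"):
            if not has_attribute(section, name):
                problems.append(f"media {index}: a={name}")
    return problems
```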

The bandwidth field in SDP can be used to indicate to the PSS client the amount of bandwidth that is required for the 
session and for the individual media in the presentation. Therefore, a PSS server should include the "b=AS:" field in 
the SDP (both at session and at media level), and a PSS client shall be able to interpret this field. The bandwidth value 
shall indicate the maximum net rate of the media streams, excluding lower-level packetisation overhead. 
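A client might read the "b=AS:" values at session and media level as in the sketch below. RFC 2327 expresses the bandwidth value in kilobits per second. The function name and the key labels in the returned dictionary are hypothetical.

```python
def as_bandwidths(sdp):
    """Map 'session' and 'media N' to their b=AS value in kbit/s (None when absent)."""
    result = {"session": None}
    key = "session"
    media_index = -1
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            # Every following b= line belongs to this media description.
            media_index += 1
            key = f"media {media_index}"
            result[key] = None
        elif line.startswith("b=AS:"):
            result[key] = int(line[len("b=AS:"):])
    return result
```

In the aggregate-control example in annex A.2, this yields 77 kbit/s at session level and 13 and 64 kbit/s for the two media, so the media values add up to the session value.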
5.4 MIME media types 

For continuous media (speech, audio and video) the following MIME media types shall be used: 

- AMR narrow-band speech codec (see clause 7.2): MIME media type as defined in [11]; 

- AMR wide-band speech codec (see clause 7.2): MIME media type as defined in [12]; 

- MPEG-4 AAC audio codec (see clause 7.3): MIME media type as defined in RFC 3016 [13]; 

- MPEG-4 video codec (see clause 7.4): MIME media type as defined in RFC 3016 [13]; 

- H.263 [22] video codec (see clause 7.4): MIME media type as defined in annex C, clause C.1 of the present 
document. 

MIME media types for JPEG, GIF and XHTML can be used both in the "Content-Type" header field in HTTP and in the 
"type" attribute in SMIL 2.0. The following MIME media types shall be used for these media: 

- JPEG (see clause 7.5): MIME media type as defined in [15]; 

- GIF (see clause 7.6): MIME media type as defined in [15]; 

- XHTML (see clause 7.8): MIME media type as defined in annex C, clause C.2 of the present document. 

MIME media type used for SMIL files shall be according to [31] and for SDP files according to [6]. 
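The following lookup table is a rough guide to the MIME types usually associated with these references. It is illustrative only, and each value should be checked against the cited registration. The H.263 value is the one used in the annex A examples. The XHTML and SMIL values are assumptions, not quotations from annex C or [31]. The helper reflects the common convention that the "a=rtpmap" encoding name equals the MIME subtype. `MIME_TYPES` and `rtp_encoding_name` are hypothetical names.

```python
# Illustrative codec -> MIME type lookup; verify each entry against its reference.
MIME_TYPES = {
    "AMR": "audio/AMR",                # [11]
    "AMR-WB": "audio/AMR-WB",          # [12]
    "MPEG-4 AAC": "audio/MP4A-LATM",   # RFC 3016 [13]
    "MPEG-4 video": "video/MP4V-ES",   # RFC 3016 [13]
    "H.263": "video/H263-2000",        # annex C, clause C.1 (as used in annex A examples)
    "JPEG": "image/jpeg",              # [15]
    "GIF": "image/gif",                # [15]
    "XHTML": "application/xhtml+xml",  # annex C, clause C.2 (assumed value)
    "SMIL": "application/smil",        # [31] (assumed value)
    "SDP": "application/sdp",          # [6]
}


def rtp_encoding_name(codec):
    """Encoding name for a=rtpmap, taken as the MIME subtype, e.g. 'H263-2000'."""
    return MIME_TYPES[codec].split("/", 1)[1]
```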

6 Data transport 

6.1 Packet based network interface 

PSS clients and servers shall support an IP-based network interface for the transport of session control and media data. 
Control and media data are sent using TCP/IP [8] and UDP/IP [7]. An overview of the protocol stack can be found in 
figure 2 of the present document. 

6.2 RTP over UDP/IP 

The IETF RTP [9], [10] provides a means for sending real-time or streaming data over UDP (see [7]). The encoded 
media are encapsulated in RTP packets using media-specific RTP payload formats, which are defined by the IETF. RTP 
also specifies a control protocol, RTCP (see clause 6 in [9]), which provides feedback on the transmission quality. 
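Every RTP packet starts with the fixed header defined in [9], whatever payload format it carries. The sketch below parses that header and returns the payload that the media-specific payload format then interprets. The field layout follows the RTP fixed header. The class and function names are hypothetical, and payload-format handling is out of scope here.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class RtpHeader:
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int


def parse_rtp(packet):
    """Split an RTP packet into its fixed header and the payload bytes."""
    if len(packet) < 12:
        raise ValueError("shorter than the 12-byte RTP fixed header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    header = RtpHeader(version=b0 >> 6, padding=bool(b0 & 0x20),
                       extension=bool(b0 & 0x10), csrc_count=b0 & 0x0F,
                       marker=bool(b1 & 0x80), payload_type=b1 & 0x7F,
                       sequence_number=seq, timestamp=ts, ssrc=ssrc)
    if header.version != 2:
        raise ValueError(f"unexpected RTP version {header.version}")
    offset = 12 + 4 * header.csrc_count          # skip the CSRC list
    if header.extension:
        if len(packet) < offset + 4:
            raise ValueError("truncated header extension")
        # 16-bit profile field, then the extension length in 32-bit words.
        (words,) = struct.unpack("!H", packet[offset + 2:offset + 4])
        offset += 4 + 4 * words
    end = len(packet)
    if header.padding:
        end -= packet[-1]                        # last octet counts the padding octets
    if offset > end:
        raise ValueError("truncated RTP packet")
    return header, packet[offset:end]
```

The payload type returned here is the dynamic number (for example 96 to 98) that the SDP binds to a codec through "a=rtpmap".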

RTP/UDP/IP transport of continuous media (speech, audio and video) shall be supported. 

For RTP/UDP/IP transport of continuous media the following RTP payload formats shall be used: 

- AMR narrow-band speech codec (see clause 7.2): RTP payload format according to [11]; 

- AMR wide-band speech codec (see clause 7.2): RTP payload format according to [12]; 

- MPEG-4 AAC audio codec (see clause 7.3): RTP payload format according to RFC 3016 [13]; 

- MPEG-4 video codec (see clause 7.4): RTP payload format according to RFC 3016 [13]; 

- H.263 [22] video codec (see clause 7.4): RTP payload format according to RFC 2429 [14]. 

6.3 HTTP over TCP/IP 

The IETF TCP provides reliable transport of data over IP networks, but with no delay guarantees. It is the preferred way 
for sending the scene description, text, bitmap graphics and still images. There is also need for an application protocol 
to control the transfer. The IETF HTTP [17] provides this functionality. 
HTTP/TCP/IP transport shall be supported for: 

- still images (see clause 7.5); 

- bitmap graphics (see clause 7.6); 

- text (see clause 7.8); 

- scene description (see clause 8); 

- presentation description (see clause 5.3.3). 
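The split between the two transports (clauses 5.3.1, 6.2 and 6.3) can be written down as a small lookup. This is a sketch only. The media-kind labels and the names `Transport` and `required_transport` are hypothetical.

```python
from enum import Enum


class Transport(Enum):
    RTP_UDP = "RTP/UDP/IP, set up and controlled with RTSP"
    HTTP_TCP = "HTTP/TCP/IP"


# Continuous media per clause 5.3.1; bitmap graphics are treated as discrete media.
CONTINUOUS_MEDIA = {"speech", "audio", "video"}
HTTP_DELIVERED = {"still image", "bitmap graphics", "text",
                  "scene description", "presentation description"}


def required_transport(kind):
    """Return the transport that must be supported for a given kind of content."""
    if kind in CONTINUOUS_MEDIA:
        return Transport.RTP_UDP
    if kind in HTTP_DELIVERED:
        return Transport.HTTP_TCP
    raise ValueError(f"no transport requirement stated for {kind!r}")
```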

6.4 Transport of RTSP 

Transport of RTSP shall be supported according to RFC 2326 [5]. 



7 Codecs 

7.1 General 



For PSS offering a particular media type, media codecs are specified in the following clauses. 



7.2 Speech 



The AMR codec shall be supported for narrow-band speech [18]. The AMR wideband speech codec [20] shall be 
supported when wideband speech working at 16 kHz sampling frequency is supported. 



7.3 Audio 



MPEG-4 AAC Low Complexity object type [21] should be supported. The maximum sampling rate to be supported by 
the decoder is 48 kHz. The channel configurations to be supported are mono (1/0) and stereo (2/0). In addition, the 
MPEG-4 AAC Long Term Prediction object type may be supported. 



7.4 Video 



ITU-T Recommendation H.263 [22] profile 0 level 10 shall be supported. This is the mandatory video codec for the 
PSS. In addition, the PSS should support: 

- H.263 [23] Profile 3 Level 10; 

- MPEG-4 Visual Simple Profile Level 0, [24] and [25]. 

These two video codecs are optional to implement. 

NOTE: ITU-T Recommendation H.263 [22] baseline has been mandated to ensure that video-enabled PSS 
implementations support a minimum baseline video capability and that interoperability can be guaranteed (an 
H.263 [22] baseline bitstream can be decoded by both H.263 [22] and MPEG-4 decoders). It also provides a 
simple upgrade path for mandating more advanced codecs in the future (from both the ITU-T and ISO MPEG). 
7.5 Still images 



ISO/IEC JPEG [26] together with JFIF [27] shall be supported. The support for ISO/IEC JPEG applies only to the 
following two modes: 

- baseline DCT, non-differential, Huffman coding, as defined in table B.1, symbol 'SOF0' in [26]; 

- progressive DCT, non-differential, Huffman coding, as defined in table B.1, symbol 'SOF2' in [26]. 
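The coding mode of a JPEG file is signalled by its start-of-frame (SOF) marker. A client can therefore tell whether an image falls within the two supported modes before decoding it, as the sketch below shows. The marker codes are those of ISO/IEC 10918-1 (SOF0 = 0xFFC0, SOF2 = 0xFFC2). The function name is hypothetical. Only the segment walk up to the first SOF marker is implemented; the frame header itself is not validated.

```python
import struct

# The two required modes (both non-differential, Huffman coding).
REQUIRED_SOF = {0xC0: "baseline", 0xC2: "progressive"}
# Remaining SOFn markers; 0xC4 (DHT), 0xC8 (JPG) and 0xCC (DAC) are not frame markers.
OTHER_SOF = {0xC1, 0xC3, 0xC5, 0xC6, 0xC7, 0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF}


def jpeg_coding_mode(data):
    """Return 'baseline', 'progressive' or 'unsupported' from the first SOF marker."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker")
    pos = 2
    while pos + 1 < len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:                       # fill byte before the marker code
            pos += 1
            continue
        if marker in REQUIRED_SOF:
            return REQUIRED_SOF[marker]
        if marker in OTHER_SOF:
            return "unsupported"
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:  # standalone markers, no length
            pos += 2
            continue
        if marker == 0xD9 or pos + 4 > len(data):     # EOI or truncated data
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        pos += 2 + length                        # length includes its own two bytes
    raise ValueError("no start-of-frame marker found")
```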

7.6 Bitmap graphics 

The following bitmap graphics codecs should be supported: 

- GIF87a, [32]; 

- GIF89a, [33]. 



7.7 Vector graphics 



No vector graphics codec is specified for the simple PSS. For the extended PSS mandatory and/or optional vector 
graphics codecs can be specified. 

7.8 Text 

The text codec is intended to enable formatted text in a SMIL presentation. A PSS client shall support: 

- text formatted according to XHTML Basic [28]; 

- rendering a SMIL presentation where text is referenced with the SMIL 2.0 "text" element together with the SMIL 
2.0 "src" attribute. 

The following character encodings shall be supported: 

- UTF-8, [29]; 

- UCS-2, [30]. 

NOTE: Since both SMIL and XHTML are XML-based languages, it would be possible to define a SMIL plus 
XHTML profile. In contrast to the PSS4 SMIL Language Profile defined in the present document, which 
contains only SMIL modules, such a profile would also contain XHTML modules. No combined SMIL and 
XHTML profile is specified for PSS. Rendering of such documents is out of the scope of the present document. 



8 Scene description 

8.1 General 

The 3GPP PSS uses a subset of SMIL 2.0 [31] as the format of the scene description. PSS clients and servers with 
support for scene descriptions shall support the 3GPP PSS4 SMIL Language Profile defined in clause 8.2. This profile is 
a subset of the SMIL 2.0 Language Profile, but a superset of the SMIL 2.0 Basic Language Profile. The present 
document also includes an informative annex B that provides guidelines for SMIL content authors. 

NOTE: The interpretation of this is not that all streaming sessions are required to use SMIL. For some types of 
sessions, e.g. consisting of one single continuous media or two media synchronised by using RTP 
timestamps, SMIL may not be needed. 
8.2 3GPP PSS4 SMIL Language Profile 

8.2.1 Introduction 

3GPP PSS4 SMIL is a markup language based on SMIL Basic [31] and the SMIL Scalability Framework. 

3GPP PSS4 SMIL shall consist of the modules required by the SMIL Basic Profile (and SMIL 2.0 Host Language 
Conformance) and the additional MediaAccessibility, MediaDescription, MediaClipping, Metainformation, 
PrefetchControl and EventTiming modules. In total, the following modules are included: 

- SMIL 2.0 Content Control Modules: BasicContentControl, SkipContentControl and PrefetchControl 

- SMIL 2.0 Layout Module: BasicLayout 

- SMIL 2.0 Linking Module: BasicLinking 

- SMIL 2.0 Media Object Modules: BasicMedia, MediaClipping, MediaAccessibility and MediaDescription 

- SMIL 2.0 Metainformation Module: Metainformation 

- SMIL 2.0 Structure Module: Structure 

- SMIL 2.0 Timing and Synchronization Modules: BasicInlineTiming, MinMaxTiming, BasicTimeContainers, 
RepeatTiming and EventTiming 

8.2.2 Document Conformance 

A conforming 3GPP PSS4 SMIL document shall be a conforming SMIL 2.0 document. 

All 3GPP PSS4 SMIL documents use the SMIL 2.0 namespace: 

<smil xmlns="http://www.w3.org/2001/SMIL20/Language"> 

3GPP PSS4 SMIL documents may declare requirements using the systemRequired attribute: 

EXAMPLE 1: <smil xmlns="http://www.w3.org/2001/SMIL20/Language" 
           xmlns:EventTiming="http://www.w3.org/2000/SMIL20/CR/EventTiming" 
           systemRequired="EventTiming"> 

The namespace URI http://www.3gpp.org/SMIL20/PSS4/ identifies 3GPP PSS4 SMIL. Authors can use this URI to 
indicate a requirement for exact 3GPP PSS4 SMIL semantics for a document or a subpart of a document: 

EXAMPLE 2: <smil xmlns="http://www.w3.org/2001/SMIL20/Language" 
           xmlns:pss4="http://www.3gpp.org/SMIL20/PSS4/" 
           systemRequired="pss4"> 

Content authors should generally not include the PSS4 requirement in a document unless the SMIL document relies on 
PSS-specific semantics that are not part of W3C SMIL. The reason is that SMIL players that are not conforming 3GPP 
PSS user agents may not recognize the PSS4 URI and would thus refuse to play the document. 
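The sketch below shows how a player might resolve systemRequired. It maps each "+"-separated prefix to the namespace URI declared on the root element, then refuses the document if any required URI is unknown to it. This is the behaviour that the advice above guards against. It uses Python's standard `xml.etree.ElementTree.iterparse` with "start-ns" events. The function names and the `recognised` set are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

SMIL20_NS = "http://www.w3.org/2001/SMIL20/Language"
PSS4_NS = "http://www.3gpp.org/SMIL20/PSS4/"


def required_namespaces(document):
    """Return the namespace URIs named by systemRequired on the root <smil> element."""
    prefixes = {}
    for event, item in ET.iterparse(io.StringIO(document), events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = item                 # declarations arrive before the root start
            prefixes[prefix] = uri
            continue
        if item.tag != f"{{{SMIL20_NS}}}smil":
            raise ValueError("root element is not <smil> in the SMIL 2.0 namespace")
        tokens = [t.strip() for t in item.get("systemRequired", "").split("+") if t.strip()]
        unknown = [t for t in tokens if t not in prefixes]
        if unknown:
            raise ValueError(f"undeclared systemRequired prefix(es): {unknown}")
        return [prefixes[t] for t in tokens]
    raise ValueError("empty document")


def can_play(document, recognised):
    """A player refuses a document that requires a namespace it does not recognise."""
    return all(uri in recognised for uri in required_namespaces(document))
```

With EXAMPLE 2 as input, a player that only recognises the SMIL 2.0 namespace would return False, while a conforming 3GPP PSS4 user agent would return True.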

8.2.3 User Agent Conformance 

A conforming 3GPP PSS4 SMIL user agent shall be a conforming SMIL Basic User Agent. 

A conforming user agent shall implement the semantics of the language as described in this document. 

A conforming user agent shall recognize the URIs of all included SMIL 2.0 modules. It shall also recognize the URI 
http://www.3gpp.org/SMIL20/PSS4/ as referring to all modules and semantics of the 3GPP PSS4 SMIL language. 
8.2.4 3GPP SMIL Language Profile 



3GPP PSS4 SMIL is based on the SMIL 2.0 Basic Language Profile [31]. This clause defines the content model and 
integration semantics of the included modules where they differ from those defined by SMIL Basic. 

8.2.4.1 Content Control Modules 

3GPP PSS4 SMIL shall include the content control functionality of the BasicContentControl, SkipContentControl and 
PrefetchControl modules of SMIL 2.0. PrefetchControl is not part of SMIL Basic and is an additional module in this 
profile. 

All BasicContentControl attributes listed in the module specification shall be supported. 

NOTE: The SMIL specification [31] defines all functionality of the PrefetchControl module as optional. This 
means that, even though PrefetchControl is mandatory in this profile, user agents may implement the 
semantics of the PrefetchControl module only partially or not at all. The PrefetchControl module adds the 
prefetch element to the content model of the SMIL Basic body, switch, par and seq elements. 

The prefetch element has the attributes defined by the PrefetchControl module (mediaSize, mediaTime and 
bandwidth), the src attribute, the BasicContentControl attributes and the skip-content attribute. 

8.2.4.2 Layout Module 

3GPP PSS4 SMIL shall use the BasicLayout module of SMIL 2.0 for spatial layout. The module is part of SMIL Basic. 
Default values of the width and height attributes for root-layout shall be the dimensions of the device display area. 

8.2.4.3 Linking Module 

3GPP PSS4 SMIL shall use the SMIL 2.0 BasicLinking module for providing hyperlinks between documents and 
document fragments. This module is from SMIL Basic. 

When linking to destinations outside the current document, implementations may ignore the values "play" and "pause" 
of the 'sourcePlaystate' attribute and the values "new" and "pause" of the 'show' attribute, and instead use the semantics 
of the values "stop" and "replace" respectively. When the values of 'sourcePlaystate' and 'show' are ignored, the player 
may also ignore the 'sourceLevel' attribute, since it then has no effect. 
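The simplification permitted above for external links can be expressed as a small mapping. The sketch models a player that chooses to use the permitted fallback for every external link; other players may honour the original values instead. The function name and its tuple return format are hypothetical.

```python
# Fallback values for a player that does not honour these values on external links.
_PLAYSTATE_FALLBACK = {"play": "stop", "pause": "stop"}
_SHOW_FALLBACK = {"new": "replace", "pause": "replace"}


def simplified_link_behaviour(source_playstate, show, target_is_external):
    """Return (sourcePlaystate, show, honour_source_level) as this simple player applies them."""
    if not target_is_external:
        # Links within the current document keep their authored semantics.
        return source_playstate, show, True
    playstate = _PLAYSTATE_FALLBACK.get(source_playstate, source_playstate)
    new_show = _SHOW_FALLBACK.get(show, show)
    # With the source stopped and replaced, sourceLevel has nothing to act on.
    return playstate, new_show, False
```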

8.2.4.4 Media Object Modules 

3GPP PSS4 SMIL shall include the media elements from the SMIL 2.0 BasicMedia module and the attributes from the 
MediaAccessibility, MediaDescription and MediaClipping modules. The MediaAccessibility, MediaDescription and 
MediaClipping modules are additions to SMIL Basic in this profile. 

See clause 5.4 for the mandatory and optional MIME types that a 3GPP PSS4 SMIL player needs to support. 

The MediaClipping module adds to the profile the ability to address sub-clips of continuous media. It adds the 
'clipBegin' and 'clipEnd' (and, for compatibility, 'clip-begin' and 'clip-end') attributes to all media elements. 

The MediaAccessibility module provides basic accessibility support for media elements. It adds the new attributes 'alt', 
'longdesc' and 'readIndex' to all media elements. The MediaDescription module is included by the MediaAccessibility 
module and adds the 'abstract', 'author' and 'copyright' attributes to media elements. 

8.2.4.5 Metainformation Module 

The Metainformation module of SMIL 2.0 shall be included in the profile. This module is an addition to SMIL Basic in 
this profile and provides a way to include descriptive information about the document content in the document. 

This module adds the meta and metadata elements to the content model of the SMIL Basic head element. 

8.2.4.6 Structure Module 

The Structure module defines the top-level structure of the document. It is included in SMIL Basic. 
8.2.4.7 Timing and Synchronization modules 

The timing modules included in the 3GPP SMIL shall be BasicInlineTiming, MinMaxTiming, BasicTimeContainers, 
RepeatTiming and EventTiming. The EventTiming module is an addition in this profile to the SMIL Basic. 

For the 'begin' and 'end' attributes, either a single offset-value or a single event-value shall be allowed. Offsets shall 
not be supported with event-values. 

Event timing attributes that reference invalid IDs (for example elements that have been removed by the content control) 
shall be treated as being indefinite. 
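The two restrictions above can be checked with a small classifier. It accepts a single offset-value or a single "id.eventName" event-value with no offset, and it resolves references to missing IDs to indefinite. The clock-value grammar is a simplified assumption: full clock "h:mm:ss(.f)", partial clock "mm:ss(.f)" and timecounts such as "5", "2.5s", "100ms", "1.5min" or "1h". The full SMIL 2.0 grammar also allows escaped IDs and more. The function name is hypothetical.

```python
import re

# Simplified SMIL 2.0 clock values (assumption, not the complete grammar).
_CLOCK = (r"(?:\d+:[0-5]\d:[0-5]\d(?:\.\d+)?"
          r"|\d\d:[0-5]\d(?:\.\d+)?"
          r"|\d+(?:\.\d+)?(?:h|min|s|ms)?)")
OFFSET_VALUE = re.compile(r"[+-]?" + _CLOCK)
# Event value "<id>.<eventName>" with no trailing offset, as this profile requires.
EVENT_VALUE = re.compile(r"([A-Za-z_][\w-]*)\.([A-Za-z]+)")


def classify_timing_value(value, known_ids):
    """Return ('offset', v), ('event', v) or ('indefinite', None) for a begin/end value."""
    v = value.strip()
    if v == "indefinite":
        return ("indefinite", None)
    if OFFSET_VALUE.fullmatch(v):
        return ("offset", v)
    match = EVENT_VALUE.fullmatch(v)
    if match:
        # References to missing IDs (e.g. removed by content control) become indefinite.
        if match.group(1) not in known_ids:
            return ("indefinite", None)
        return ("event", v)
    raise ValueError(f"not a single offset-value or event-value: {value!r}")
```

A value such as "intro.beginEvent+2s" is rejected, because this profile does not allow offsets on event-values.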

Supported event names and semantics shall be as defined by the SMIL 2.0 Language Profile. All user agents shall be 
able to raise the following event types: 

- activateEvent; 

- beginEvent; 

- endEvent. 

The following SMIL 2.0 Language event types should be supported: 

- focusInEvent; 

- focusOutEvent; 

- inBoundsEvent; 

- outBoundsEvent; 

- repeatEvent. 

User agents shall ignore unknown event types and not treat them as errors. 
Events do not bubble and shall be delivered to the associated media or timed elements only. 

8.2.5 Content Model 

This table shows the full content model and attributes of the 3GPP PSS4 SMIL profile. The attribute collections used 
are defined by SMIL Basic ([31], SMIL Host Language Conformance requirements, chapter 2.4). Changes to the SMIL 
Basic are shown in bold. 
Element | Elements | Attributes 

smil | head, body | COMMON-ATTRS, CONTCTRL-ATTRS, xmlns 

head | layout, switch, meta, metadata | COMMON-ATTRS 

body | TIMING-ELMS, MEDIA-ELMS, switch, a, prefetch | COMMON-ATTRS 

layout | root-layout, region | COMMON-ATTRS, CONTCTRL-ATTRS, type 

root-layout | EMPTY | COMMON-ATTRS, backgroundColor, height, width, skip-content 

region | EMPTY | COMMON-ATTRS, backgroundColor, bottom, fit, height, left, right, showBackground, top, width, 
z-index, skip-content, regionName 

ref, animation, audio, img, video, text, textstream | area | COMMON-ATTRS, CONTCTRL-ATTRS, TIMING-ATTRS, 
repeat, region, MEDIA-ATTRS, clipBegin (clip-begin), clipEnd (clip-end), alt, longdesc, readIndex, abstract, author, 
copyright 

a | MEDIA-ELMS | COMMON-ATTRS, LINKING-ATTRS 

area | EMPTY | COMMON-ATTRS, LINKING-ATTRS, TIMING-ATTRS, repeat, shape, coords, nohref 

par, seq | TIMING-ELMS, MEDIA-ELMS, switch, a, prefetch | COMMON-ATTRS, CONTCTRL-ATTRS, TIMING-ATTRS, 
repeat 

switch | TIMING-ELMS, MEDIA-ELMS, layout, a, prefetch | COMMON-ATTRS, CONTCTRL-ATTRS 

prefetch | EMPTY | COMMON-ATTRS, CONTCTRL-ATTRS, mediaSize, mediaTime, bandwidth, src, skip-content 

meta | EMPTY | COMMON-ATTRS, content, name, skip-content 

metadata | EMPTY | COMMON-ATTRS, skip-content 



9 Interchange format for MMS 

9.1 General 



The MPEG-4 file format [34] is mandated in [35] to be used for continuous media along the entire delivery chain 
envisaged by the MMS, independent of whether the final delivery is done by streaming or download, thus enhancing 
interoperability. 

In particular, the following stages are considered: 

- upload from the originating terminal to the MMS proxy; 

- file exchange between MMS servers; 

- transfer of the media content to the receiving terminal, either by file download or by streaming. In the first case 
the self-contained file is transferred, whereas in the second case the content is extracted from the file and 
streamed according to open payload formats. In the streaming case, no trace of the file format remains in the 
content that goes on the wire/in the air. 

Additionally, the MPEG-4 file format can be used for the storage in the servers and the "hint track" mechanism can be 
used for the preparation for streaming. 

Clause 9.2 of the present document gives the requirements to be followed for the MPEG-4 file format used in MMS. 
These requirements guarantee that PSS can interwork with MMS and that the MPEG-4 file format can be used internally 
within the MMS system. PSS servers not interworking with MMS are not required to follow these guidelines. 
9.2 MPEG-4 file format guidelines 

9.2.1 Registration of non-ISO codecs 

How to include the non-ISO code streams of AMR narrow-band speech and H.263-encoded video in MP4 files is 
described in annex D of the present document. 

9.2.2 Hint tracks 

The hint tracks are a mechanism that the server implementation may choose to use in preparation for the streaming of 
media content contained in MP4 files. However, it should be observed that the usage of the hint tracks is an internal 
implementation matter for the server, and it falls outside the scope of the present document. 

9.2.3 Self-contained MP4 files 

All media in the MP4 file shall be self-contained, i.e. the MP4 file shall not reference external media data. 
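In the MPEG-4 (ISO base media) file format, each data reference entry in a track's 'dref' box carries a flag (0x000001) that indicates whether the media data is in the same file. The sketch below walks the boxes from 'moov' down to 'dref' and reports whether every entry has that flag set. It is a sketch under simplifying assumptions. Only the container path moov/trak/mdia/minf/dinf is visited, and data references elsewhere are not examined. The function names are hypothetical.

```python
import struct

# Boxes that only contain other boxes on the path from 'moov' down to 'dref'.
CONTAINER_BOXES = {b"moov", b"trak", b"mdia", b"minf", b"dinf"}
SELF_CONTAINED_FLAG = 0x000001


def iter_boxes(data, start, end):
    """Yield (box_type, payload_start, box_end) for consecutive boxes in data[start:end]."""
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack(">I4s", data[pos:pos + 8])
        header = 8
        if size == 1:                          # 64-bit 'largesize' follows the type
            if pos + 16 > end:
                raise ValueError(f"truncated {box_type!r} box at offset {pos}")
            (size,) = struct.unpack(">Q", data[pos + 8:pos + 16])
            header = 16
        elif size == 0:                        # box runs to the end of the enclosing range
            size = end - pos
        if size < header or pos + size > end:
            raise ValueError(f"malformed {box_type!r} box at offset {pos}")
        yield box_type, pos + header, pos + size
        pos += size


def is_self_contained(data):
    """Return False if any data reference entry lacks the self-contained flag."""
    def walk(start, end):
        for box_type, body, box_end in iter_boxes(data, start, end):
            if box_type in CONTAINER_BOXES:
                if not walk(body, box_end):
                    return False
            elif box_type == b"dref":
                # FullBox header (version 1 byte + flags 3 bytes), then entry_count (4 bytes).
                for _, entry_body, _ in iter_boxes(data, body + 8, box_end):
                    flags = int.from_bytes(data[entry_body + 1:entry_body + 4], "big")
                    if not flags & SELF_CONTAINED_FLAG:
                        return False
        return True
    return walk(0, len(data))
```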

9.2.4 MPEG-4 systems specific elements 

Tracks relating to MPEG-4 Systems architectural elements (e.g. BIFS scene description tracks or Object Descriptor 
(OD) tracks) are optional and shall be ignored. The adoption of the MPEG-4 file format does not imply the usage of the 
MPEG-4 Systems architecture. The receiving terminal is not required to implement any of the specific MPEG-4 Systems 
architectural elements. 
Annex A (informative): 
Protocols 

A.1 SDP 

This clause gives some background information on SDP. 

Table A.1 provides an overview of the different SDP fields that can be identified in an SDP file. 

Table A.1: Overview of fields in SDP 

Type | Description | Requirement according to [6] | Requirement according to the present document 

Session description 
v | Protocol version | R | R 
o | Owner/creator and session identifier | R | R 
s | Session name | R | R 
i | Session information | O | O 
u | URI of description | O | O 
e | Email address | O | O 
p | Phone number | O | O 
c | Connection information | R | R 
b | Bandwidth information: AS | O | R 
z | Time zone adjustments | O | O 
k | Encryption key | O | O 
a | Session attributes: control | O | R 
a | Session attributes: range | O | R 

Time description 
t | Time the session is active | R | R 
r | Repeat times | O | O 

Media description 
m | Media name and transport address | R | R 
i | Media title | O | O 
c | Connection information | R | R 
b | Bandwidth information: AS | O | R 
k | Encryption key | O | O 
a | Attribute lines: control | O | R 
a | Attribute lines: range | O | R 
a | Attribute lines: fmtp | O | R 
a | Attribute lines: rtpmap | O | R 

Note 1 : R = Required, = Optional 

Note 2: The "c" type is only required on the session level if not present on the media level. 

Note 3: The "c" type is only required on the media level if not present on the session level. 



The example below shows an SDP file that could be sent to a PSS client to initiate unicast streaming of an H.263 video sequence.
EXAMPLE:
v=0
o=ghost 2890844526 2890842807 IN IP4 192.168.10.10
s=3GPP Unicast SDP Example
i=Example of Unicast SDP file
u=http://www.infoserver.com/ae600
e=ghost@mailserver.com
c=IN IP4 192.168.30.29
b=AS:128
t=0 0
a=range:npt=0-45.678
m=video 1024 RTP/AVP 96
b=AS:128
a=rtpmap:96 H263-2000/90000
a=fmtp:96 profile=3;level=10
a=control:rtsp://mediaserver.com/movie
a=recvonly
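Table A.1 distinguishes the session level (everything before the first "m=" line) from the media level (each "m=" line and the lines that follow it). As an illustration only, the following Python sketch applies that split to the example above and indexes attribute lines by name. `parse_sdp` is a hypothetical helper, not a complete SDP parser: it does not validate field order or value syntax.

```python
from typing import Dict, List, Tuple

Level = Dict[str, List[str]]

def parse_sdp(text: str) -> Tuple[Level, List[Level]]:
    """Split SDP text into session-level fields and one dict per media description.

    Keys are the one-letter field types ("v", "c", "b", ...); attribute lines
    are keyed as "a:<name>" (e.g. "a:rtpmap") so they can be looked up by name.
    """
    session: Level = {}
    media: List[Level] = []
    current = session  # lines belong to the session level until the first "m="
    for raw in text.splitlines():
        line = raw.strip()
        if len(line) < 2 or line[1] != "=":
            continue  # ignore blank lines and anything that is not "<type>=<value>"
        ftype, value = line[0], line[2:]
        if ftype == "m":
            current = {}  # every "m=" line starts a new media description
            media.append(current)
        key = ftype
        if ftype == "a":
            name, _, value = value.partition(":")
            key = "a:" + name
        current.setdefault(key, []).append(value)
    return session, media


EXAMPLE_SDP = """\
v=0
o=ghost 2890844526 2890842807 IN IP4 192.168.10.10
s=3GPP Unicast SDP Example
i=Example of Unicast SDP file
u=http://www.infoserver.com/ae600
e=ghost@mailserver.com
c=IN IP4 192.168.30.29
b=AS:128
t=0 0
a=range:npt=0-45.678
m=video 1024 RTP/AVP 96
b=AS:128
a=rtpmap:96 H263-2000/90000
a=fmtp:96 profile=3;level=10
a=control:rtsp://mediaserver.com/movie
a=recvonly
"""

session, media = parse_sdp(EXAMPLE_SDP)
print(session["b"], media[0]["a:rtpmap"])  # ['AS:128'] ['96 H263-2000/90000']
```

Keying attributes by name makes it straightforward to look up, for example, the "a=control" value used to build the RTSP SETUP URL for each media description.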



A.2 RTSP 



The example below is intended to give a better understanding of how RTSP and SDP are used within the 3GPP PSS. The example assumes that the streaming client has the RTSP URL of a presentation consisting of an H.263 video sequence and AMR speech. Messages beginning with a method name (DESCRIBE, SETUP, PLAY, PAUSE, TEARDOWN) are sent from the client to the server; messages beginning with "RTSP/1.0" are responses sent from the server to the client. In the example, the server provides aggregate control of the two streams.



EXAMPLE:

DESCRIBE rtsp://mediaserver.com/movie.test RTSP/1.0
CSeq: 1

RTSP/1.0 200 OK
CSeq: 1
Content-Type: application/sdp
Content-Length: 203

v=0
o=- 950814089 950814089 IN IP4 144.132.134.67
s=Example of aggregate control of AMR speech and H.263 video
c=IN IP4 192.168.30.29
b=AS:77
t=0 0
a=range:npt=0-59.3478
a=control:*
m=audio 0 RTP/AVP 97
b=AS:13
a=rtpmap:97 AMR/8000
a=fmtp:97 mode-set=0,2,5,7; maxframes=1
a=control:streamID=0
m=video 0 RTP/AVP 98
b=AS:64
a=rtpmap:98 H263-2000/90000
a=fmtp:98 profile=3;level=10
a=control:streamID=1

SETUP rtsp://mediaserver.com/movie.test/streamID=0 RTSP/1.0
CSeq: 2
Transport: RTP/AVP/UDP;unicast;client_port=3456-3457

RTSP/1.0 200 OK
CSeq: 2
Transport: RTP/AVP/UDP;unicast;client_port=3456-3457;server_port=5678-5679
Session: dfhyrio90llk

SETUP rtsp://mediaserver.com/movie.test/streamID=1 RTSP/1.0
CSeq: 3
Transport: RTP/AVP/UDP;unicast;client_port=3458-3459
Session: dfhyrio90llk

RTSP/1.0 200 OK
CSeq: 3
Transport: RTP/AVP/UDP;unicast;client_port=3458-3459;server_port=5680-5681
Session: dfhyrio90llk

PLAY rtsp://mediaserver.com/movie.test RTSP/1.0
CSeq: 4
Session: dfhyrio90llk

RTSP/1.0 200 OK
CSeq: 4
Session: dfhyrio90llk
Range: npt=0-
RTP-Info: url=rtsp://mediaserver.com/movie.test/streamID=0;seq=9900093;rtptime=4470048,
 url=rtsp://mediaserver.com/movie.test/streamID=1;seq=1004096;rtptime=1070549



The user watches the movie for 20 seconds and then decides to fast forward to 10 seconds before 
the end... 

PAUSE rtsp://mediaserver.com/movie.test RTSP/1.0
CSeq: 5
Session: dfhyrio90llk

PLAY rtsp://mediaserver.com/movie.test RTSP/1.0
CSeq: 6
Range: npt=50-59.3478
Session: dfhyrio90llk

RTSP/1.0 200 OK
CSeq: 5
Session: dfhyrio90llk

RTSP/1.0 200 OK
CSeq: 6
Session: dfhyrio90llk
Range: npt=50-59.3478
RTP-Info: url=rtsp://mediaserver.com/movie.test/streamID=0;seq=39900043;rtptime=44470648,
 url=rtsp://mediaserver.com/movie.test/streamID=1;seq=31004046;rtptime=41090349



After the movie is over, the client issues a TEARDOWN to end the session...
TEARDOWN rtsp://mediaserver.com/movie.test RTSP/1.0
CSeq: 7
Session: dfhyrio90llk

RTSP/1.0 200 OK
CSeq: 7
Session: dfhyrio90llk
Connection: close
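In the PLAY responses above, the RTP-Info header gives, for each stream URL, the sequence number of the first RTP packet sent and the RTP timestamp corresponding to the start of the returned Range. The following Python sketch splits such a header value into per-stream entries. `parse_rtp_info` is a hypothetical helper and, like the example, it assumes that no URL contains a comma or a semicolon.

```python
from typing import Dict, List

def parse_rtp_info(value: str) -> List[Dict[str, str]]:
    """Split an RTP-Info header value into one dict per stream.

    Streams are separated by "," and parameters by ";"; surrounding whitespace
    (including folded continuation lines) is ignored. Assumes no URL contains "," or ";".
    """
    streams: List[Dict[str, str]] = []
    for stream in value.split(","):
        params: Dict[str, str] = {}
        for param in stream.split(";"):
            name, sep, val = param.strip().partition("=")
            if sep:
                params[name] = val  # "url" keeps any later "=", e.g. "streamID=0"
        if params:
            streams.append(params)
    return streams


header = ("url=rtsp://mediaserver.com/movie.test/streamID=0;seq=9900093;rtptime=4470048,"
          " url=rtsp://mediaserver.com/movie.test/streamID=1;seq=1004096;rtptime=1070549")
for entry in parse_rtp_info(header):
    print(entry["url"], int(entry["seq"]), int(entry["rtptime"]))
```

Partitioning each parameter on its first "=" only is what keeps "streamID=0" intact inside the url value.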
Annex B (informative):
SMIL authoring guidelines 

B.1 General 

This is an informative annex for SMIL presentation authors. Authors can expect that PSS clients can handle the SMIL module collection defined in clause 8.2, with the restrictions defined in this annex. When creating SMIL documents, the author is recommended to consider that terminals may have small displays and simple input devices. The media types and their encoding included in the presentation should be restricted to what is described in clause 7 of the present document. Considering that many mobile devices may have limited software and hardware capabilities, the number of media to be played simultaneously should be limited. For example, many devices will not be able to handle more than one video sequence at a time.



B.2 BasicLinking 



The Linking Modules define elements and attributes for navigational hyperlinking, either through user interaction or 
through temporal events. The BasicLinking module defines the a and area elements for basic linking: 

a      Similar to the "a" element in HTML, it provides a link from a media object through the href attribute (which contains the URI of the link's destination). The "a" element includes a number of attributes for defining the behaviour of the presentation when the link is followed.

area   Whereas the "a" element only allows a link to be associated with a complete media object, the "area" element allows links to be associated with spatial and/or temporal portions of a media object.

The area element may be useful for enabling services that rely on interactivity where the display is not big enough to show links alongside a media window (e.g. a QCIF video window). Instead, the user could, for example, click on a watermark logo displayed in the video window to visit the company website.

Although the area element may be useful, some mobile terminals will not be able to handle area elements that include multiple selectable regions. One reason for this could be that the terminals do not have the appropriate user interface. Such area elements should therefore be avoided; instead, it is recommended that the "a" element be used. If the "area" element is used, the SMIL presentation should also include alternative links to navigate through the presentation, i.e. the author should not create presentations that rely on the player being able to handle "area" elements.



B.3 BasicLayout 

The "fit" attribute defines how different media should be fitted into their respective display regions. 

The rendering and layout of some objects on a small display might be difficult, and not all mobile devices support features such as scroll bars; in addition, the root-layout window may represent the full screen of the display. Therefore "fit=scroll" should not be used.

Due to hardware restrictions in mobile devices, operations such as scaling of a video sequence, or even of images, may be very difficult to achieve. According to the SMIL 2.0 specification, SMIL players may in these situations clip the content instead. To ensure that the presentation is displayed as the author intended, content should be encoded in a size suitable for the intended terminals, and it is recommended to use "fit=hidden".
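An authoring tool could check a presentation against these recommendations before it is published. The following Python sketch uses the standard xml.etree.ElementTree module to list elements whose "fit" attribute is present and not "hidden"; an absent attribute is not reported, since "hidden" is the SMIL 2.0 default. The function name and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

def find_non_hidden_fit(smil_text: str) -> List[Tuple[str, str]]:
    """Return (element local name, fit value) for every explicit fit other than "hidden"."""
    root = ET.fromstring(smil_text)
    findings: List[Tuple[str, str]] = []
    for elem in root.iter():
        fit = elem.get("fit")
        if fit is not None and fit != "hidden":
            local_name = elem.tag.rsplit("}", 1)[-1]  # drop the namespace URI, if any
            findings.append((local_name, fit))
    return findings


SAMPLE = """<smil xmlns="http://www.w3.org/2001/SMIL20/Language">
  <head><layout>
    <root-layout width="176" height="144"/>
    <region id="video" width="176" height="144" fit="hidden"/>
    <region id="text" top="100" height="44" fit="scroll"/>
  </layout></head>
  <body><video src="clip.3gp" region="video"/></body>
</smil>"""

print(find_non_hidden_fit(SAMPLE))  # [('region', 'scroll')]
```

Reporting every explicit value other than "hidden" (not only "scroll") also surfaces "meet", "slice" and "fill", which may require the scaling discussed above.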
B.4 EventTiming



The two attributes "endEvent" and "repeatEvent" in the EventTiming module may cause problems for a mobile SMIL player. The end of a media element triggers the "endEvent". In the same way, the "repeatEvent" occurs when the second and subsequent iterations of a repeated element begin playback. Both these events rely on the SMIL player receiving information that the media element has ended, for example when the end of a video sequence initiates the event. If the player has not received explicit information about the duration of the video sequence, e.g. from the "dur" attribute in SMIL or from an external source such as the "a=range" field in SDP, it will have to rely on the RTCP BYE message to decide when the video sequence ends. If the RTCP BYE message is lost, the player will have problems initiating the event. For these reasons, it is recommended that the "endEvent" and "repeatEvent" attributes be used with care and, if used, that the player be provided with additional information about the duration of the media element that triggers the event, e.g. the "dur" attribute in SMIL or the "a=range" field in SDP.
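When the duration comes from the "a=range" field, the player has to convert the npt range into a length in seconds. The following Python sketch (a hypothetical helper) handles only the plain-seconds npt form used in the examples of Annex A, such as "a=range:npt=0-45.678"; other npt forms, such as hh:mm:ss or "now", are outside its scope and raise ValueError.

```python
from typing import Optional

def npt_range_duration(attr: str) -> Optional[float]:
    """Return end - start in seconds for "a=range:npt=<start>-<end>".

    Returns None for an open-ended range such as "npt=0-", where no duration is known
    and the player must rely on other signals (e.g. RTCP BYE).
    """
    prefix = "a=range:npt="
    if not attr.startswith(prefix):
        raise ValueError("not an npt range attribute: " + attr)
    start_text, sep, end_text = attr[len(prefix):].partition("-")
    if not sep:
        raise ValueError("missing '-' in npt range: " + attr)
    start = float(start_text)  # only decimal seconds are supported
    if end_text.strip() == "":
        return None
    return float(end_text) - start


print(npt_range_duration("a=range:npt=0-59.3478"))
```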

The "inBoundsEvent" and "outOfBoundsEvent" attributes assume that the terminal has a pointer device for moving the focus into a window (i.e. clicking within a window). Not all terminals will support this functionality, since they do not have the appropriate user interface. Hence care should be taken in using these particular event triggers.



B.5 MetaInformation



Authors are encouraged to make use of meta data whenever providing such information to the mobile terminal appears 
to be useful. However, they should keep in mind that some mobile terminals will parse but not process the meta data. 

Furthermore, authors should keep in mind that excessive use of meta data will substantially increase the file size of the 
SMIL presentation that needs to be transferred to the mobile terminal. This may result in longer set-up times. 



B.6 XML entities 

Entities are a mechanism to insert XML fragments inside an XML document. Entities can be internal, essentially a 
macro expansion, or external. Use of XML entities in SMIL presentations is not recommended, as many current XML 
parsers do not fully support them. 



B.7 XHTML Basic 



When rendering text in a SMIL presentation, authors are able to use XHTML Basic, which consists of eleven modules. However, some of the modules include non-text information. When referring to an XHTML Basic document from a SMIL document, authors should use only the required XHTML Host Language modules: the Structure Module, Text Module, Hypertext Module and List Module. In particular, the Image Module should not be used. Images and other non-text content should be included in the SMIL document.

Note: An XHTML file including a module which is not part of the XHTML Host Language modules may not be shown as intended.
Annex C (normative):
MIME media types 

C.1 MIME media type H263-2000 

MIME media type name: video 
MIME subtype name: H263-2000 

Required parameters: None 

Optional parameters: 

profile: H.263 profile number, in the range 0 through 8, specifying the supported H.263 annexes/subparts.

level: Level of bitstream operation, in the range 0 through 99, specifying the level of computational complexity of the decoding process. When no profile and level parameters are specified, Baseline Profile (Profile 0) level 10 are the default values.

The profile and level specifications can be found in [23]. Note that the RTP payload format for H263-2000 is the same 
as for H263-1998 and is defined in [14], but additional annexes/subparts are specified along with the profiles and levels. 

NOTE: The above text will be replaced with a reference to the RFC describing the H263-2000 MIME media type 
as soon as this becomes available. 
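The defaults above can be applied when reading the "a=fmtp" line for an H263-2000 payload type, such as "a=fmtp:96 profile=3;level=10" in the SDP example of clause A.1. The following Python sketch is hypothetical: it assumes parameters are separated by ";", and it treats the lower bound of both ranges as 0, which is an assumption.

```python
from typing import Tuple

def h263_2000_profile_level(fmtp: str) -> Tuple[int, int]:
    """Return (profile, level) from an "a=fmtp:<pt> <params>" line.

    Missing parameters fall back to the defaults: Baseline Profile (0), level 10.
    """
    _, _, params = fmtp.partition(" ")  # drop the "a=fmtp:<pt>" prefix
    values = {"profile": 0, "level": 10}
    for item in params.split(";"):
        name, sep, val = item.strip().partition("=")
        if sep and name in values:
            values[name] = int(val)
    if not 0 <= values["profile"] <= 8:
        raise ValueError("profile outside the assumed range 0..8")
    if not 0 <= values["level"] <= 99:
        raise ValueError("level outside the assumed range 0..99")
    return values["profile"], values["level"]


print(h263_2000_profile_level("a=fmtp:96 profile=3;level=10"))  # (3, 10)
```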



C.2 MIME media type xhtml+xml 



MIME media type name: application 
MIME subtype name: xhtml+xml 

Required parameters: none 

Optional parameters: 

charset: This parameter has identical semantics to the charset parameter of the "application/xml" media type as specified in [16].

NOTE: The above text will be replaced with a reference to the RFC describing the xhtml+xml MIME media type 
as soon as this becomes available. 
Annex D (normative):

Support for non-ISO code streams in MP4 files 



D.1 General 



The purpose of this annex is to define the necessary structure for integration of the H.263 and AMR media specific 
information in an MP4 file. Clauses D.2 to D.4 give some background information about the Sample Description atom, 
VisualSampleEntry atom and the AudioSampleEntry atom in the MPEG-4 file format. Then, the definitions of the 
SampleEntry atoms for AMR and H.263 are given in clauses D.5 to D.8. 

AMR data is stored in the stream according to clause 8 of [11].



D.2 Sample Description atom 



In an MP4 file, the Sample Description Atom gives detailed information about the coding type used, and any initialisation information needed for that coding. The Sample Description Atom can be found in the MP4 Atom Structure Hierarchy shown in figure D.1.



Movie Atom
  Track Atom
    Media Atom
      Media Information Atom
        Sample Table Atom
          Sample Description Atom

Figure D.1: MP4 Atom Structure Hierarchy

The Sample Description Atom can have one or more SampleDescriptionEntry fields. Valid SampleDescriptionEntry atoms already defined for MP4 are the AudioSampleEntry, VisualSampleEntry, HintSampleEntry and MpegSampleEntry Atoms. The SampleDescriptionEntry Atoms for AMR and H.263 shall be AMRSampleEntry and H263SampleEntry, respectively.
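Within a file, each atom in figure D.1 is identified by a four-character type code; in the MPEG-4 file format these are 'moov', 'trak', 'mdia', 'minf', 'stbl' and 'stsd', respectively (these codes are not given in the present document). The following Python sketch follows that path to locate the payload of the Sample Description Atom. It is a simplification under stated assumptions: only 32-bit atom sizes are handled, and every atom on the path before 'stsd' is treated as a pure container. The helper names are hypothetical.

```python
import struct
from typing import Iterator, List, Optional, Tuple

# Four-character types of the atoms in figure D.1 (MPEG-4 file format names).
SAMPLE_DESCRIPTION_PATH = ["moov", "trak", "mdia", "minf", "stbl", "stsd"]

def iter_atoms(data: bytes, start: int, end: int) -> Iterator[Tuple[str, int, int]]:
    """Yield (type, payload_start, atom_end) for consecutive atoms in data[start:end]."""
    pos = start
    while pos + 8 <= end:
        size, atom_type = struct.unpack_from(">I4s", data, pos)
        if size < 8 or pos + size > end:
            # size 0 (to end of file) and 1 (64-bit size) are not handled in this sketch
            raise ValueError("unsupported or corrupt atom size at offset %d" % pos)
        yield atom_type.decode("latin-1"), pos + 8, pos + size
        pos += size

def find_atom_path(data: bytes, path: List[str]) -> Optional[Tuple[int, int]]:
    """Return (payload_start, atom_end) of the last atom on path, or None if absent."""
    start, end = 0, len(data)
    for wanted in path:
        for atom_type, payload_start, atom_end in iter_atoms(data, start, end):
            if atom_type == wanted:
                start, end = payload_start, atom_end  # descend into this atom
                break
        else:
            return None  # the wanted atom is not at this level
    return start, end

def make_atom(atom_type: str, payload: bytes) -> bytes:
    """Build an atom with a 32-bit size and a four-character type (for testing)."""
    return struct.pack(">I4s", 8 + len(payload), atom_type.encode("latin-1")) + payload


stsd = make_atom("stsd", bytes(8))  # placeholder payload only
moov = make_atom("moov", make_atom("trak", make_atom("mdia",
       make_atom("minf", make_atom("stbl", stsd)))))
print(find_atom_path(moov, SAMPLE_DESCRIPTION_PATH))  # (48, 56)
```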
The format of SampleDescriptionEntry and its fields is explained as follows:
SampleDescriptionEntry ::= VisualSampleEntry |
                           AudioSampleEntry |
                           HintSampleEntry |
                           MpegSampleEntry |
                           H263SampleEntry |
                           AMRSampleEntry

Table D.1: SampleDescriptionEntry fields

Field | Details
VisualSampleEntry | Entry type for visual samples defined in the MPEG-4 specification.
AudioSampleEntry | Entry type for audio samples defined in the MPEG-4 specification.
HintSampleEntry | Entry type for hint track samples defined in the MPEG-4 specification.
MpegSampleEntry | Entry type for MPEG related stream samples defined in the MPEG-4 specification.
H263SampleEntry | Entry type for H.263 visual samples defined in clause D.6 of the present document.
AMRSampleEntry | Entry type for AMR speech samples defined in clause D.5 of the present document.

Of the six atoms above, only the VisualSampleEntry, AudioSampleEntry, H263SampleEntry and AMRSampleEntry atoms are taken into consideration, since MPEG-specific streams and hint tracks are out of the scope of the present document.



D.3 VisualSampleEntry atom 

The VisualSampleEntry Atom is defined as follows:
VisualSampleEntry ::= AtomHeader
                      Reserved_6
                      Data-reference-index
                      Reserved_16
                      Reserved_4
                      Reserved_4
                      Reserved_4
                      Reserved_4
                      Reserved_2
                      Reserved_32
                      Reserved_2
                      Reserved_2
                      ESDAtom



Table D.2: VisualSampleEntry fields

Field | Type | Details | Value
AtomHeader.Size | Unsigned int(32) | |
AtomHeader.Type | Unsigned int(32) | | 'mp4v'
Reserved_6 | Unsigned int(8) | | 0
Data-reference-index | Unsigned int(16) | Index to the data reference to use to retrieve the sample data. Data references are stored in data reference Atoms. |
Reserved_16 | Const unsigned int(32) | | 0
Reserved_4 | Const unsigned int(32) | | 0x014000f0
Reserved_4 | Const unsigned int(32) | | 0x00480000
Reserved_4 | Const unsigned int(32) | | 0x00480000
Reserved_4 | Const unsigned int(32) | | 0
Reserved_2 | Const unsigned int(16) | | 1
Reserved_32 | Const unsigned int(8) | | 0
Reserved_2 | Const unsigned int(16) | | 24
Reserved_2 | Const int(16) | | -1
ESDAtom | | Elementary stream descriptor for this stream. |

The stream type specific information is in the ESDAtom structure, which will be explained later. 

D.4 AudioSampleEntry atom 

The AudioSampleEntry Atom is defined as follows:
AudioSampleEntry ::= AtomHeader
                     Reserved_6
                     Data-reference-index
                     Reserved_8
                     Reserved_2
                     Reserved_2
                     Reserved_4
                     TimeScale
                     Reserved_2
                     ESDAtom
Table D.3: AudioSampleEntry fields

Field | Type | Details | Value
AtomHeader.Size | Unsigned int(32) | |
AtomHeader.Type | Unsigned int(32) | | 'mp4a'
Reserved_6 | Unsigned int(8) | | 0
Data-reference-index | Unsigned int(16) | Index to the data reference to use to retrieve the sample data. Data references are stored in data reference Atoms. |
Reserved_8 | Const unsigned int(32) | | 0
Reserved_2 | Const unsigned int(16) | | 2
Reserved_2 | Const unsigned int(16) | | 16
Reserved_4 | Const unsigned int(32) | | 0
TimeScale | Unsigned int(16) | Copied from track |
Reserved_2 | Const unsigned int(16) | | 0
ESDAtom | | Elementary stream descriptor for this stream. |

The stream type specific information is in the ESDAtom structure, which will be explained later. 

D.5 AMRSampleEntry atom 

The atom type of the AMRSampleEntry Atom shall be 'samr'.
The AMRSampleEntry Atom is defined as follows:
AMRSampleEntry ::= AtomHeader
                   Reserved_6
                   Data-reference-index
                   Reserved_8
                   Reserved_2
                   Reserved_2
                   Reserved_4
                   TimeScale
                   Reserved_2
                   DecoderSpecificInfo
Table D.4: AMRSampleEntry fields

Field | Type | Details | Value
AtomHeader.Size | Unsigned int(32) | |
AtomHeader.Type | Unsigned int(32) | | 'samr'
Reserved_6 | Unsigned int(8) | | 0
Data-reference-index | Unsigned int(16) | Index to the data reference to use to retrieve the sample data. Data references are stored in data reference Atoms. |
Reserved_8 | Const unsigned int(32) | | 0
Reserved_2 | Const unsigned int(16) | | 2
Reserved_2 | Const unsigned int(16) | | 16
Reserved_4 | Const unsigned int(32) | | 0
TimeScale | Unsigned int(16) | Copied from the media header atom of this media |
Reserved_2 | Const unsigned int(16) | | 0
DecoderSpecificInfo | | Information specific to the decoder. |

Comparing the AudioSampleEntry Atom with the AMRSampleEntry Atom, the main difference is the replacement of the ESDAtom, which is specific to MPEG-4 systems, with an atom suitable for AMR. The structure of the DecoderSpecificInfo field for AMR is described in clause D.7.
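To make the byte layout of table D.4 concrete, the following Python sketch serialises an AMRSampleEntry Atom: a 36-byte fixed part followed by an already encoded DecoderSpecificInfo. It assumes big-endian byte order (as used throughout the MP4 file format), that each Reserved_N field occupies N bytes, and that reserved fields without a listed constant are zero. The function name and the timescale of 8000 in the usage line are illustrative assumptions.

```python
import struct

def build_samr_entry(data_reference_index: int, timescale: int,
                     decoder_specific_info: bytes) -> bytes:
    """Serialise an AMRSampleEntry ('samr') atom following the field order of table D.4.

    In the format string, "Nx" emits N zero bytes (the zero-valued reserved fields)
    and "H" is a big-endian 16-bit unsigned integer.
    """
    body = struct.pack(
        ">6xH8xHH4xH2x",
        data_reference_index,  # after Reserved_6 (6 zero bytes)
        2,                     # Reserved_2 = 2, after Reserved_8 (8 zero bytes)
        16,                    # Reserved_2 = 16
        timescale,             # TimeScale, after Reserved_4 (4 zero bytes)
    )                          # the trailing "2x" is the final Reserved_2 (zero)
    body += decoder_specific_info  # DecoderSpecificInfo (clause D.7), pre-encoded
    return struct.pack(">I4s", 8 + len(body), b"samr") + body


entry = build_samr_entry(1, 8000, bytes(14))  # 14-byte placeholder DecoderSpecificInfo
print(len(entry))  # 50
```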



D.6 H263SampleEntry atom 

The atom type of the H263SampleEntry Atom shall be 's263'.
The H263SampleEntry Atom is defined as follows:
H263SampleEntry ::= AtomHeader
                    Reserved_6
                    Data-reference-index
                    Reserved_16
                    Reserved_4
                    Reserved_4
                    Reserved_4
                    Reserved_4
                    Reserved_2
                    Reserved_32
                    Reserved_2
                    Reserved_2
                    DecoderSpecificInfo



Table D.5: H263SampleEntry fields

Field | Type | Details | Value
AtomHeader.Size | Unsigned int(32) | |
AtomHeader.Type | Unsigned int(32) | | 's263'
Reserved_6 | Unsigned int(8) | | 0
Data-reference-index | Unsigned int(16) | Index to the data reference to use to retrieve the sample data. Data references are stored in data reference Atoms. |
Reserved_16 | Const unsigned int(32) | | 0
Reserved_4 | Const unsigned int(32) | | 0x014000f0
Reserved_4 | Const unsigned int(32) | | 0x00480000
Reserved_4 | Const unsigned int(32) | | 0x00480000
Reserved_4 | Const unsigned int(32) | | 0
Reserved_2 | Const unsigned int(16) | | 1
Reserved_32 | Const unsigned int(8) | | 0
Reserved_2 | Const unsigned int(16) | | 24
Reserved_2 | Const int(16) | | -1
DecoderSpecificInfo | | Information specific to the decoder. |

Comparing the VisualSampleEntry Atom with the H263SampleEntry Atom, the main difference is the replacement of the ESDAtom, which is specific to MPEG-4 systems, with an atom suitable for H.263. The structure of the DecoderSpecificInfo field for H.263 is described in clause D.8.
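The same approach applies to table D.5. The following Python sketch serialises an H263SampleEntry Atom: an 86-byte fixed part followed by an already encoded DecoderSpecificInfo. As before, it assumes big-endian byte order, that each Reserved_N field occupies N bytes, and that reserved fields without a listed constant are zero; the function name is hypothetical.

```python
import struct

def build_s263_entry(data_reference_index: int,
                     decoder_specific_info: bytes) -> bytes:
    """Serialise an H263SampleEntry ('s263') atom following the field order of table D.5.

    "Nx" emits N zero bytes, "H"/"I" are big-endian unsigned 16/32-bit integers
    and "h" is a signed 16-bit integer (used for the final Reserved_2 = -1).
    """
    body = struct.pack(
        ">6xH16xIII4xH32xHh",
        data_reference_index,  # after Reserved_6 (6 zero bytes)
        0x014000F0,            # Reserved_4, after Reserved_16 (16 zero bytes)
        0x00480000,            # Reserved_4
        0x00480000,            # Reserved_4
        1,                     # Reserved_2 = 1, after Reserved_4 (4 zero bytes)
        24,                    # Reserved_2 = 24, after Reserved_32 (32 zero bytes)
        -1,                    # Reserved_2 = -1
    )
    body += decoder_specific_info  # DecoderSpecificInfo (clause D.8), pre-encoded
    return struct.pack(">I4s", 8 + len(body), b"s263") + body


entry = build_s263_entry(1, bytes(16))  # 16-byte placeholder DecoderSpecificInfo
print(len(entry))  # 102
```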



D.7 DecoderSpecificInfo field for AMRSampleEntry atom

The DecoderSpecificInfo fields for AMR shall be as defined in table D.6. The DecoderSpecificInfo for the AMRSampleEntry Atom shall always be included if the MP4 file contains AMR media.

Table D.6: The DecoderSpecificInfo fields for AMRSampleEntry

Field | Type | Details | Value
DecSpecificInfoTag | Bit(8) | | 0x05
SizeOfDecSpecificInfo | Unsigned int(32) | |
DecSpecificInfo | AMRDecSpecStruc | Structure which holds the AMR specific information |

DecSpecificInfoTag: identifies that this is a DecoderSpecificInfo field. It shall be set to 0x05.
SizeOfDecSpecificInfo: defines the size (in bytes) of the DecSpecificInfo structure that follows.
DecSpecificInfo: the structure where the AMR stream specific information resides.
The AMRDecSpecStruc is defined as follows:
struct AMRDecSpecStruc {
    Unsigned int (32) vendor
    Unsigned int (8)  decoder_version
    Unsigned int (16) mode_set
    Unsigned int (8)  mode_change_period
    Unsigned int (8)  frames_per_sample
}

The definitions of AMRDecSpecStruc members are as follows: 

vendor: four character code of the manufacturer of the codec, e.g. 'VXYZ'. 

decoder_version: version of the decoder which created the AMR stream being stored; the value is set to 0 if the version has no importance.

mode_set: the active codec modes. Each bit of the mode_set parameter corresponds to one mode. The bit index of the mode is calculated according to the 4-bit FT field of the AMR frame structure. The mapping of existing AMR modes to FT is given in table 1.a in [19]. The mode_set bit structure is as follows: (B15xxxxxxB8B7xxxxxxB0), where B0 (Least Significant Bit) corresponds to Mode 0 and B8 corresponds to Mode 8. A value of 0x01FF (bits B0 to B8 set) means that all modes are possibly present in the AMR stream. As an example, if mode_set = 00000001 10010101b, only AMR Modes 0, 2, 4, 7 and 8 are present in the AMR stream.
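The bit mapping can be expressed directly in code. The following Python sketch (a hypothetical helper, not part of the file format) lists the modes signalled by a mode_set value and reproduces the example above.

```python
from typing import List

def modes_in_mode_set(mode_set: int) -> List[int]:
    """Return the AMR modes signalled by a 16-bit mode_set value (bit Bn = Mode n)."""
    if not 0 <= mode_set <= 0xFFFF:
        raise ValueError("mode_set is a 16-bit field")
    return [n for n in range(16) if mode_set & (1 << n)]


print(modes_in_mode_set(0b00000001_10010101))  # [0, 2, 4, 7, 8]
```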

mode_change_period: defines a number N, which restricts mode changes to occur only at multiples of N frames. If no restriction is applied, this value should be set to 0. If mode_change_period is not 0, the following restrictions apply to it according to the frames_per_sample field:

if (mode_change_period < frames_per_sample)
    frames_per_sample = k * mode_change_period
else if (mode_change_period > frames_per_sample)
    mode_change_period = k * frames_per_sample

where k is an integer greater than or equal to 2.

If mode_change_period is equal to frames_per_sample, then the AMR mode is the same for all frames inside one sample.
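A writer could verify this restriction before storing the values. The following Python sketch is a hypothetical check that combines the cases above: 0 means no restriction, equal values are allowed, and otherwise the larger value must be an integer multiple of the smaller one.

```python
def mode_change_period_is_valid(mode_change_period: int,
                                frames_per_sample: int) -> bool:
    """Check the mode_change_period / frames_per_sample restriction of clause D.7."""
    if frames_per_sample <= 0:
        raise ValueError("frames_per_sample should be greater than 0")
    if mode_change_period < 0:
        raise ValueError("mode_change_period is an unsigned field")
    if mode_change_period in (0, frames_per_sample):
        return True  # no restriction, or the mode is fixed within each sample
    small = min(mode_change_period, frames_per_sample)
    large = max(mode_change_period, frames_per_sample)
    # large > small here, so divisibility already implies k = large // small >= 2
    return large % small == 0


for period in (0, 5, 3, 20, 15):
    print(period, mode_change_period_is_valid(period, 10))
```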

frames_per_sample: defines the number of frames to be considered as 'one sample' inside the MP4 file. This number should be greater than 0. A value of 1 means each frame is treated as one sample. A value of 10 means that 10 AMR frames (of duration 20 ms each) are put together and treated as one sample. Note that, in this case, one sample duration is 20 (ms/frame) × 10 (frames) = 200 ms. For the last sample of the AMR stream, the number of frames can be smaller than frames_per_sample if the number of remaining frames is smaller than frames_per_sample.

NOTE: The "hinter", for the creation of the hint tracks, can use the information given by the AMRDecSpecStruc 
members. 
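Tables D.6 and the AMRDecSpecStruc definition can be combined into a single encoder and decoder. The following Python sketch assumes big-endian byte order with no padding between members, and that SizeOfDecSpecificInfo counts only the AMRDecSpecStruc bytes (9), as its definition states. The class and function names are hypothetical, and 'VXYZ' is the placeholder vendor code used in the text.

```python
import struct
from typing import NamedTuple

class AMRDecSpecStruc(NamedTuple):
    vendor: bytes            # four-character code, e.g. b"VXYZ"
    decoder_version: int     # 0 if the version has no importance
    mode_set: int            # 16-bit mask, bit n = AMR Mode n
    mode_change_period: int  # 0 = no restriction
    frames_per_sample: int   # AMR frames (20 ms each) per MP4 sample

# Big-endian, no padding: 4 + 1 + 2 + 1 + 1 = 9 bytes.
AMR_STRUC = struct.Struct(">4sBHBB")

def pack_amr_dsi(info: AMRDecSpecStruc) -> bytes:
    """Encode DecSpecificInfoTag (0x05), SizeOfDecSpecificInfo (32-bit) and the struct."""
    payload = AMR_STRUC.pack(*info)
    return struct.pack(">BI", 0x05, len(payload)) + payload

def unpack_amr_dsi(data: bytes) -> AMRDecSpecStruc:
    """Decode the fields written by pack_amr_dsi, checking the tag and size."""
    tag, size = struct.unpack_from(">BI", data, 0)
    if tag != 0x05:
        raise ValueError("DecSpecificInfoTag shall be 0x05")
    if size != AMR_STRUC.size or len(data) < 5 + size:
        raise ValueError("unexpected SizeOfDecSpecificInfo")
    return AMRDecSpecStruc(*AMR_STRUC.unpack_from(data, 5))


info = AMRDecSpecStruc(b"VXYZ", 0, 0x0195, 0, 10)
blob = pack_amr_dsi(info)
print(len(blob), unpack_amr_dsi(blob) == info)  # 14 True
print(20 * info.frames_per_sample, "ms per sample")  # 200 ms per sample
```

The resulting 14 bytes are what would follow the fixed part of the AMRSampleEntry Atom.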



D.8 DecoderSpecificInfo field for H263SampleEntry atom

The DecoderSpecificInfo fields for H.263 shall be as defined in table D.7. The DecoderSpecificInfo for the H263SampleEntry Atom shall always be included if the MP4 file contains H.263 media.

The DecoderSpecificInfo for H.263 is composed of the following fields.
Table D.7: The DecoderSpecificInfo fields for H263SampleEntry

Field | Type | Details | Value
DecSpecificInfoTag | Bit(8) | | 0x05
SizeOfDecSpecificInfo | Unsigned int(32) | |
DecSpecificInfo | H263DecSpecStruc | Structure which holds the H.263 specific information |

DecSpecificInfoTag: identifies that this is a DecoderSpecificInfo field. It shall be set to 0x05.
SizeOfDecSpecificInfo: defines the size (in bytes) of the DecSpecificInfo structure that follows.
DecSpecificInfo: the structure where the H.263 stream specific information resides.
The H263DecSpecStruc is defined as follows:
struct H263DecSpecStruc {
    Unsigned int (32) vendor
    Unsigned int (8)  decoder_version
    Unsigned int (8)  H263_Level
    Unsigned int (8)  H263_Profile
    Unsigned int (16) max_width
    Unsigned int (16) max_height
}

The definitions of H263DecSpecStruc members are as follows: 

vendor: Four character code of the manufacturer of the codec, e.g. 'VXYZ'. 

decoder_version: Version of the decoder which created the H.263 stream being stored. This value is set to 0 if the version has no importance.

H263_Level and H263_Profile: These two parameters define which H.263 profile and level are used. These parameters are based on the MIME media type video/H263-2000. The profile and level specifications can be found in [23].

EXAMPLE 1: H.263 Baseline = {H263_Level = 10, H263_Profile = 0} 

EXAMPLE 2: H.263 Profile 3 @ Level 10 = {H263_Level = 10, H263_Profile = 3}

max_width: The maximum width of the encoded image.

max_height: The maximum height of the encoded image.

NOTE 1: The max_width and max_height parameters together may be used to allocate the necessary memory in the playback device without the need to analyse the H.263 stream.

NOTE 2: The "hinter", for the creation of the hint tracks, can use the information given by the H263DecSpecStruc members.
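The following Python sketch encodes and decodes the H.263 DecoderSpecificInfo of table D.7 and, following NOTE 1, derives a buffer size from max_width and max_height. It assumes big-endian byte order with no padding, and 8-bit 4:2:0 decoded frames as used by H.263; the class and function names are hypothetical.

```python
import struct
from typing import NamedTuple

class H263DecSpecStruc(NamedTuple):
    vendor: bytes         # four-character code, e.g. b"VXYZ"
    decoder_version: int  # 0 if the version has no importance
    h263_level: int       # e.g. 10
    h263_profile: int     # e.g. 0 for Baseline
    max_width: int        # maximum width of the encoded image
    max_height: int       # maximum height of the encoded image

# Big-endian, no padding: 4 + 1 + 1 + 1 + 2 + 2 = 11 bytes.
H263_STRUC = struct.Struct(">4sBBBHH")

def pack_h263_dsi(info: H263DecSpecStruc) -> bytes:
    """Encode DecSpecificInfoTag (0x05), SizeOfDecSpecificInfo (32-bit) and the struct."""
    payload = H263_STRUC.pack(*info)
    return struct.pack(">BI", 0x05, len(payload)) + payload

def unpack_h263_dsi(data: bytes) -> H263DecSpecStruc:
    """Decode the fields written by pack_h263_dsi, checking the tag and size."""
    tag, size = struct.unpack_from(">BI", data, 0)
    if tag != 0x05 or size != H263_STRUC.size or len(data) < 5 + size:
        raise ValueError("not a valid H.263 DecoderSpecificInfo")
    return H263DecSpecStruc(*H263_STRUC.unpack_from(data, 5))

def yuv420_frame_bytes(info: H263DecSpecStruc) -> int:
    """Bytes for one decoded 8-bit 4:2:0 frame of the maximum size."""
    return info.max_width * info.max_height * 3 // 2


info = H263DecSpecStruc(b"VXYZ", 0, 10, 3, 176, 144)  # Profile 3 @ Level 10, QCIF
blob = pack_h263_dsi(info)
print(len(blob), yuv420_frame_bytes(info))  # 16 38016
```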